Week 07: Code Generation & Operational Semantics
Code Generation
Introduction
Goal
Generating code for a stack machine with the accumulator. Run the code on a real machine, e.g., the MIPS processor.
Concept
- The accumulator is kept in MIPS register
$a0.
- Keeps the stack in memory.
- The address of the next location on the stack is kept in MIPS register
$sp
- The top of the stack is at address
$sp + 4
MIPS
We will compile the source code from Cool to MIPS assembly to execute it.
MIPS
- RISC
- Operation uses registers for operands and results.
- Use
load and store instructions to manipulate the value in memory.
- 32 general purpose registers x 32bits per register
MIPS Instruction
lw reg_1 offset(reg_2): Load 32-bit word in address reg_2 + offset into register reg_1
sw reg_1 offset(reg_2): Store 32-bit word in register reg_1 into address reg_2 + offset
add reg_tar reg_src reg_src2
addiu reg_tar reg_src imm: reg_tar = reg_src + imm (u means overflow is not checked.)
li reg imm: reg = imm
beq reg1 reg2 label: If reg1 = reg2, then jump to label.
b label: Jump to label.
Sample for 7+5
acc <- 7
push acc
acc <- 5
acc <- acc + top_of_stack
pop
li $a0 7
sw $a0 0($sp)
addiu $sp $sp -4
li $a0 5
lw $t1 4($sp)
add $a0 $a0 $t1
addiu $sp $sp 4
Code Generation
How it works
For expression e, generate MIPS code
- Compute the value of
e in $a0
- Preserve
$sp and the contents of the stack
Define a code generation function cgen(e) whose result is the code generated for e.
The code in source
cgen(e1 + e2) =
cgen(e1)
print "sw $a0 0($sp)"
print "addiu $sp $sp -4"
cgen(e2)
print "lw $t1 4($sp)"
print "add $a0 $t1 $a0"
print "addiu $sp $sp 4"
Remarks
- Before entering the another cgen function, you should not save anything in the temporal register. Because the temporal register may be used after. Just use the stack machine.
- Stack machine code generation is recursive.
- Code generation can be written as a recursive-descent of the AST.
- In different language, code for function calls and function definitions depends on the layout of the AR.
Simple AR for Cool
- The result is always in the accumulator.
- The activation record holds actual parameters.
- The stack discipline guarantees that on function exit
$sp is the same as it was on function entry.
On function exit
- We need the return address to jump back.
- A pointer to the current activation is useful.
- This pointer lives in register
$fp (frame pointer)
- When entering a call to
f(x,y) the AR of f will be pushed into the stack:
- old fp
- y
- x
- Return Address
How do we manage the variable's location
- Problem: Because the stack grows when intermediate results are saved, the variables are not at a fixed offset from
$sp
- Solution
- Tutor: use a frame pointer
- I use another stack in code gen to save and search the variables in the stack.
- load from
$sp + (Stack_Size - Variable_Index)
- The activation record must be designed together with the code generator
- Code generation can be done by recursive traversal of the AST
Temporaries
How do we estimate the stack's size?
Basic
Let NT(e) = # of temps needed to evaluate the expression e
NT(e1 + e2)
- Needs at least as many temporaries as
NT(e1)
- Needs at least as many temporaries as
NT(e2) + 1
- Space used for temporaries in
e1 can be reused for temporaries in e2
NT(e1 + e2) = max(NT(e1), 1 + NT(e2))
NT(e1 - e2) = max(NT(e1), 1 + NT(e2))
NT(if e1 = e2 then e3 else e4) = max(NT(e1),1 + NT(e2), NT(e3), NT(e4))
NT(id(e1,...,en) = max(NT(e1),...,NT(en))
NT(int) = 0
NT(id) = 0
Function
For a function definition f(x1,...,xn) = e the AR has 2 + n + NT(e) elements
- Return Address
$ra
- Frame Pointer
$fp
- n arguments
NT(e) locations for intermediate results
Tempories in AR
- old fp
- xn
- ...
- x1
- RA
- Temp NT(e)
- ...
- Temp 1
Remarks
- Code generation must know how many temporaries are in use at each point
- Add a new argument to code generation
- the position of the next available temporary
- The temporary area is used like a small, fixed-size stack
Object Layout
Basic
- OO implementation = Basic code generation + More stuff
- If B inherits A, code in class A works unmodified for an object of class B
- Each attribute stored at a fixed offset in the object
- The attribute is in the same place in every object of that class and inherited classes.
Object Layout
- Offset +0: Class Tag (Int)
- Offset +4: Object Size (Int)
- Offset +8: Pointer to Class Method Table (Address)
- Offset +12, 16, ...: Pointer to Attribute (Address)
- If C inherits B and B inherits A, then the sequence is A's attributes, B's attributes, and C's attributes.
Class Method Table Layout
- If C inherits B and B inherits A
- A contains f1, f2, f3
- B contains f2, f3, f4, f5
- C contains f3, f5, f6
- Class C Method Table Layout
- A's method f1: A.f1
- A's method f2 overrided by B: B.f2
- A's method f3 overrided by C: C.f3
- B's method f4: B.f4
- B's method f5 overrided by C: C.f3
- C's method f6: C.f6
Cgen Semantics
Semantics Overview
Rules in every step
- The tokens $\Rightarrow$ lexical analysis
- The grammar $\Rightarrow$ syntactic analysis
- The typing rules $\Rightarrow$ semantic analysis
- The evaluation rules$\Rightarrow$ code generation and optimization
- The compilation of Cool to a stack machine.
Consideration in Assembly-language description
- Whether to use a stack machine or not
- Which way the stack grows
- How integers are represented
- The particular instruction set of the architecture
How to specify evaluation rules?
- Many ways to specify semantics
- Operational Semantics (What we use in Cool)
- Describes program evaluation via execution rules
- Most useful for specifying implementations
- Denotational Semantics
- Program’s meaning is a mathematical function
- Axiomatic Semantics
- Program behavior described via logical formulae
- If execution begins in state satisfying X, then it ends in state satisfying Y
- X, Y formulas
- Foundation of many program verification systems
Operational Semantics
TL; DR
Similar to typing rules. Read Coool Manual Ch.13 Operational semantics.